Refactor JIT kernel CI to use run_suite.py registration system #21239

Merged: merrymercy merged 3 commits into main from lianmin/final-ci-cleanup on Mar 24, 2026
@merrymercy
Contributor

Summary

  • Migrate pr-test-jit-kernel.yml from raw pytest/shell invocation to the centralized run_suite.py + register_cuda_ci registration system
  • Introduce three new suites: stage-b-kernel-unit-1-gpu-large, stage-b-kernel-benchmark-1-gpu-large, nightly-kernel-1-gpu
  • Add register_cuda_ci calls to all 35 test files and 23 benchmark files under python/sglang/jit_kernel/
  • Extend run_suite.py glob to discover JIT kernel tests and benchmarks alongside test/registered/
  • Fix missing __main__ guards in test_nvfp4_*.py files, fix test_custom_all_reduce.py to dispatch between torchrun worker and pytest
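A minimal sketch of what a registered test file could look like under this scheme. The `register_cuda_ci` defined here is a local stand-in written for illustration, not the helper from the sglang repository; the `disabled` parameter, the `CIEntry` type, and the `est_time` values are assumptions (only `est_time` and the suite names appear in this PR).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CIEntry:
    """One registration record (hypothetical shape)."""
    suite: str
    est_time: float
    disabled: Optional[str] = None  # assumed: a reason string when disabled


REGISTRY: list = []


def register_cuda_ci(est_time: float, suite: str,
                     disabled: Optional[str] = None) -> CIEntry:
    """Stand-in for the real helper: record the suite a file belongs to."""
    entry = CIEntry(suite=suite, est_time=est_time, disabled=disabled)
    REGISTRY.append(entry)
    return entry


# Top of a hypothetical test file: one call per suite it should run in.
# The est_time numbers are illustrative only.
register_cuda_ci(est_time=60, suite="stage-b-kernel-unit-1-gpu-large")
register_cuda_ci(est_time=240, suite="nightly-kernel-1-gpu")


def test_example():
    assert 1 + 1 == 2


if __name__ == "__main__":
    # The __main__ guard lets a runner execute the file directly as a
    # script without the test body running on a plain import.
    test_example()
```

Keeping the registration as a plain module-level call with literal arguments means a runner can, in principle, read it without importing the file.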

Test plan

  • Verify jit-kernel-unit-test job discovers and runs all unit tests via run_suite.py --hw cuda --suite stage-b-kernel-unit-1-gpu-large
  • Verify jit-kernel-benchmark-test job discovers and runs all benchmarks via run_suite.py --hw cuda --suite stage-b-kernel-benchmark-1-gpu-large
  • Verify jit-kernel-unit-test-nightly job runs full tests via run_suite.py --hw cuda --suite nightly-kernel-1-gpu --nightly
  • Verify disabled tests (test_custom_all_reduce, bench_custom_all_reduce, bench_norm_impls) are correctly skipped
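The discovery and skip behaviour the test plan checks can be sketched as follows. This is an illustration under assumptions, not the logic of run_suite.py: it assumes registrations are found by statically parsing each file for `register_cuda_ci(...)` calls and that a disabled file carries a `disabled="reason"` keyword. The file names `test_example.py` and `bench_example.py`, the reason string, and both function names are hypothetical.

```python
import ast


def find_registrations(source: str) -> list:
    """Collect the keyword arguments of every register_cuda_ci(...) call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "register_cuda_ci"):
            # Only literal keyword values are supported in this sketch.
            found.append({kw.arg: ast.literal_eval(kw.value)
                          for kw in node.keywords if kw.arg})
    return found


def select(files: dict, suite: str):
    """Split the files registered for `suite` into (to_run, skipped)."""
    to_run, skipped = [], []
    for name, source in sorted(files.items()):
        for reg in find_registrations(source):
            if reg.get("suite") != suite:
                continue
            (skipped if reg.get("disabled") else to_run).append(name)
    return to_run, skipped


# In-memory stand-ins for files a glob over the test directories might return.
FILES = {
    "test_example.py":
        'register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")\n',
    "test_custom_all_reduce.py":
        'register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large", '
        'disabled="needs multiple GPUs")\n',
    "bench_example.py":
        'register_cuda_ci(est_time=30, suite="stage-b-kernel-benchmark-1-gpu-large")\n',
}

to_run, skipped = select(FILES, "stage-b-kernel-unit-1-gpu-large")
print("run:", to_run)
print("skipped:", skipped)
```

With these inputs, only `test_example.py` is selected for the unit suite, `test_custom_all_reduce.py` lands in the skipped list, and the benchmark file is ignored because it belongs to a different suite.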

Made with Cursor

Migrate pr-test-jit-kernel.yml from raw pytest/shell invocation to the
centralized run_suite.py + register_cuda_ci system, introducing three
new suites:

- stage-b-kernel-unit-1-gpu-large (per-commit unit tests)
- stage-b-kernel-benchmark-1-gpu-large (per-commit benchmarks)
- nightly-kernel-1-gpu (nightly full tests)

Changes:
- Add register_cuda_ci calls to all 35 test files and 23 benchmark files
- Extend run_suite.py glob to discover jit_kernel tests/benchmarks
- Register new suites in PER_COMMIT_SUITES and NIGHTLY_SUITES
- Fix missing __main__ guards in test_nvfp4_*.py files
- Fix test_custom_all_reduce.py __main__ to handle both torchrun
  worker and direct pytest invocation
- Disable multi-GPU and self-skipping tests/benchmarks with reasons
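One way the `__main__` dispatch for test_custom_all_reduce.py could work is sketched below. It assumes the torchrun worker is detected through the `LOCAL_RANK` and `WORLD_SIZE` environment variables that torchrun sets for each worker process; the actual check in the PR may differ. `pick_entrypoint`, `run_worker`, and `main` are hypothetical names.

```python
import os


def pick_entrypoint(env) -> str:
    """Return "worker" when launched by torchrun, otherwise "pytest"."""
    # torchrun exports RANK, LOCAL_RANK and WORLD_SIZE to every worker.
    if "LOCAL_RANK" in env and "WORLD_SIZE" in env:
        return "worker"
    return "pytest"


def run_worker() -> int:
    """Placeholder for the per-rank test body (hypothetical)."""
    return 0


def main(env=os.environ) -> int:
    if pick_entrypoint(env) == "worker":
        return run_worker()
    # Imported lazily so torchrun workers do not need to load pytest.
    import pytest
    return int(pytest.main([__file__, "-v"]))
```

A file using this pattern would end with `if __name__ == "__main__": sys.exit(main())`, so that both `python test_file.py` and a torchrun launch of the same file take the appropriate path.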

Made-with: Cursor
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions bot added the quant (LLM Quantization), lora, hicache (Hierarchical Caching for SGLang), blackwell (SM100/SM120), and jit-kernel labels on Mar 23, 2026
@merrymercy
Contributor Author

/tag-and-rerun-ci

… logs

- Remove jit-kernel-unit-test-nightly from pr-test-jit-kernel.yml; add
  nightly-test-kernel-1-gpu-h100 to nightly-test-nvidia.yml with matching
  env (SGLANG_JIT_KERNEL_RUN_FULL_TESTS, SGLANG_JIT_DEEPGEMM_FAST_WARMUP,
  SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN) and workflow_dispatch filter.
- Refresh register_cuda_ci est_time values using observed wall times from
  PR #21239 run 23465359984: jit-kernel-unit-test (68277933098) and
  jit-kernel-benchmark-test (68277933115). Per-commit times use ~1.15x
  elapsed; nightly uses 4x stage-b (min 120, cap 1200) except FA4
  (120/900; FA4 elapsed is a skip on H100).
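The est_time arithmetic described above can be worked through as a short sketch. Rounding to the nearest integer and the function names are assumptions; the FA4 exception is left out because the commit message gives only the values 120/900 for it.

```python
def per_commit_est(elapsed: float) -> int:
    """Per-commit estimate: roughly 1.15x the observed wall time."""
    return round(1.15 * elapsed)  # rounding is an assumption


def nightly_est(stage_b_est: int) -> int:
    """Nightly estimate: 4x the stage-b value, clamped to [120, 1200]."""
    return min(max(4 * stage_b_est, 120), 1200)


# Illustrative elapsed times, not measurements from the referenced CI run.
for elapsed in (20, 100, 400):
    stage_b = per_commit_est(elapsed)
    print(elapsed, stage_b, nightly_est(stage_b))
```

A short test hits the floor (4 x 23 = 92 is raised to 120), a mid-range one scales directly (4 x 115 = 460), and a long one hits the cap (4 x 460 = 1840 is lowered to 1200).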

Made-with: Cursor
@merrymercy merged commit 260abe1 into main on Mar 24, 2026
62 of 129 checks passed
@merrymercy deleted the lianmin/final-ci-cleanup branch on March 24, 2026 04:17
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

Labels

blackwell (SM100/SM120), hicache (Hierarchical Caching for SGLang), jit-kernel, lora, quant (LLM Quantization), run-ci
